Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18247
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Awaiting Approval, 1 New Failure, 3 Cancelled Jobs, 3 Unrelated Failures
As of commit 2f28c1c with merge base eb92cec:
NEW FAILURE - The following job has failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Pull request overview
Updates the LLaMA example LoRA linear implementation to call an nn.Linear module in forward() (instead of torch.nn.functional.linear), presumably to align with module-based patterns.
Changes:
- Replace `torch.nn.functional.linear(x, self.weight, self.bias)` with `self.linear(x)`.
- Introduce `self.linear = nn.Linear(...)` during `LoRALinear` initialization.
Comments suppressed due to low confidence (1)
examples/models/llama/lora.py:36
`linear` and `weight` are now undefined after switching to `self.linear`; `bias = linear.bias ...` and `register_parameter("weight", ...)` will raise at init time. Either derive `bias`/`weight` from `self.linear` (or remove the extra `register_parameter` calls entirely) so construction works.
self.linear = nn.Linear(in_dim, out_dim, bias=use_bias)
bias = linear.bias if self.use_bias else None
self.register_parameter("weight", nn.Parameter(weight))
self.register_parameter(
    "bias", nn.Parameter(bias) if bias is not None else None
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Pull request overview
Updates the Llama LoRA implementation to call an nn.Linear submodule directly (instead of torch.nn.functional.linear) and simplifies quantization filtering accordingly.
Changes:
- Refactors `LoRALinear` to wrap a real `nn.Linear` module and use it in `forward`.
- Adds partial state-dict backward compatibility by remapping the legacy `weight` key.
- Simplifies the 8da*w quantization `filter_fn` to only consider `nn.Linear` modules.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies the 8da4w/8da8w quantization filter to target nn.Linear modules only. |
| examples/models/llama/lora.py | Refactors LoRALinear to delegate to an internal nn.Linear and updates forward/state-dict behavior. |
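The simplified filter described for quantize.py presumably reduces to an `isinstance` check; a sketch under that assumption (the `(module, fqn)` signature follows TorchAO's `filter_fn` convention; any group-size compatibility logic in the actual PR is omitted):

```python
from torch import nn


def linear_filter(module: nn.Module, fqn: str) -> bool:
    # With LoRALinear delegating to an internal nn.Linear submodule,
    # matching plain nn.Linear also captures the LoRA base projection,
    # so the filter no longer needs a LoRA-specific special case.
    return isinstance(module, nn.Linear)
```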
Force-pushed from 45fd1f3 to 1ebb7d7
Pull request overview
Updates the LLaMA LoRA implementation to wrap a real nn.Linear submodule (instead of calling torch.nn.functional.linear directly), enabling TorchAO quantization tooling to recognize and quantize the base linear layer consistently across LoRA and non-LoRA models.
Changes:
- Refactors `LoRALinear` to contain `self.linear: nn.Linear`, adds `weight`/`bias` properties for backward-compatible access, and remaps old checkpoint keys on load.
- Simplifies TorchAO 8da*xw quantization filtering to target `nn.Linear` modules only (no special-casing LoRA modules).
- Makes XNNPACK constant serialization keys content-hash-based (removes the tensor-name prefix) and updates the LoRA CI expectation string accordingly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies TorchAO quantization filter to match nn.Linear modules and apply group-size compatibility logic. |
| examples/models/llama/lora.py | Refactors LoRALinear to wrap an nn.Linear submodule; adds BC weight/bias accessors and state-dict key remapping. |
| backends/xnnpack/operators/node_visitor.py | Changes named constant key generation to be solely SHA256-based to stabilize dedup/indexing behavior. |
| .ci/scripts/test_lora.sh | Updates expected output prefix text for the quantized LoRA test case. |
Pull request overview
This PR updates the LLaMA LoRA implementation to use nn.Linear submodules (instead of torch.nn.functional.linear) so TorchAO quantization can capture base + LoRA linears directly, while also adjusting serialization keying to support weight sharing and updating a LoRA CI expected output string.
Changes:
- Simplify TorchAO quantization filtering to target `nn.Linear` modules only (relying on LoRA’s internal `nn.Linear` submodules).
- Refactor `LoRALinear` to contain a `self.linear` submodule, add `weight`/`bias` properties for compatibility, and remap legacy checkpoint keys during loading.
- Change the XNNPACK constant-data `named_key` to remove tensor-name components; update the LoRA CI expected output.
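The legacy-key remapping described above can be sketched with a `_load_from_state_dict` hook. The module and key names here are assumptions based on the review summary, not the PR's exact code:

```python
import torch
from torch import nn


class WrappedLinear(nn.Module):
    """Hypothetical module illustrating legacy state-dict key remapping."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
        # Old checkpoints stored the base weight as "<prefix>weight";
        # the wrapped layout expects "<prefix>linear.weight".
        old_key, new_key = prefix + "weight", prefix + "linear.weight"
        if old_key in state_dict and new_key not in state_dict:
            state_dict[new_key] = state_dict.pop(old_key)
        super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)
```

The hook runs before the child `nn.Linear` is loaded, so mutating `state_dict` in place lets old checkpoints load without `strict=False`.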
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/models/llama/source_transformation/quantize.py | Simplifies TorchAO quantization module filter to match only nn.Linear. |
| examples/models/llama/lora.py | Refactors LoRA linear to wrap an nn.Linear submodule; adds BC accessors and state-dict key remapping. |
| backends/xnnpack/operators/node_visitor.py | Changes constant-data named_key generation to be content-hash only. |
| .ci/scripts/test_lora.sh | Updates expected quantized-LoRA output prefix string. |
            f"Serializing constant data node {tensor} but tensor value has no bytes",
        )
        sha256_hash = hashlib.sha256(bytes(array))
-       named_key = tensor.name + "_" + sha256_hash.hexdigest()
+       named_key = sha256_hash.hexdigest()
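The effect of keying constants by content hash alone, as in the diff above, is that tensors with identical bytes dedupe to a single entry regardless of their names. A rough illustration (the store and helper names are hypothetical, not the node_visitor.py code):

```python
import hashlib


def named_key(data: bytes) -> str:
    # Content-hash-only key: identical bytes always map to the same entry,
    # independent of the tensor's name, which enables weight sharing.
    return hashlib.sha256(data).hexdigest()


constant_store = {}


def serialize_constant(name: str, data: bytes) -> str:
    key = named_key(data)
    # First writer wins; later tensors with identical bytes reuse the entry.
    constant_store.setdefault(key, data)
    return key


k1 = serialize_constant("encoder.weight", b"\x01\x02\x03")
k2 = serialize_constant("decoder.weight", b"\x01\x02\x03")
assert k1 == k2 and len(constant_store) == 1
```

Under the old `tensor.name + "_" + hash` scheme, the two tensors above would have produced two distinct keys and two stored copies.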
Summary
Update the LoRA definition to use `nn.Linear` instead of `torch.nn.functional.linear`.
executorch/examples/models/llama/static_attention.py, line 1154 in eb92cec
Test plan
CI